
fix(quickstart): make the one-command path actually reach first fork#243

Merged: WaylandYang merged 2 commits into main from fix/quickstart-e2e on Jun 13, 2026


@WaylandYang

Contributor

Context

Follows up on #240's open promise: "E2E on a live host: pending — will run before tagging the next release." Done — and it found six blocking bugs, all in the first-run path a new user crosses.

E2E ran inside a --privileged --device /dev/kvm Docker container on the dev box, so host networking was untouched (tap/netns/iptables all live in the container's own netns — same isolation model FC's own devtool uses).

Six bugs, six fixes

| # | Bug | Symptom |
|---|-----|---------|
| 1 | install-guest-kernel.sh hardcodes sudo | sudo: command not found in minimal/container envs running as root |
| 2 | file/strings assumed present | file: command not found after a successful kernel install |
| 3 | $USER unbound under set -u (host-tap + netns-setup) | USER: unbound variable in docker exec/non-login shells |
| 4 | unpack bakes packing-host absolute paths into snapshot.json | FC Failed to open snapshot file: No such file or directory on first restore |
| 5 | build-rootfs.sh hardcodes sudo (18 sites) | sudo: command not found mid-bake |
| 6 | tarball installs can't find scripts/+rootfs-init/ | guest boots with no init → Kernel panic … init /forkd-init.sh failed (error -2) → FC exits → pause fails |

Plus a UX fix: doctor::preflight() softens the three non-gating rows (kernel/tap/docker) from red ✗ to ⚠ — a ✗ immediately followed by "quickstart sets this up" read as a contradiction.

Fixes 1 and 5 use a uniform privilege shim ($(id -u)==0 ? "" : "sudo"). Fix 2 guards the file/strings calls with command -v, and fix 3 falls back to id -un when $USER is unset. Fix 4 rewrites the vmstate/memory paths in hub::unpack and re-runs the rewrite after the staging→final rename (v1 and v2 chain paths), operating on serde_json::Value so unknown fields survive (2 unit tests). Fix 6 stages embedded scripts as <base>/scripts/* with init files in the sibling <base>/rootfs-init/.
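
For readers who have not seen the pattern, the privilege shim can be sketched in plain POSIX shell. This is an illustration of the idea only, not the code from the scripts; the helper names `priv_prefix` and `as_root` are hypothetical.

```shell
# Sketch of a root-aware privilege shim (hypothetical helper names).
# The PR describes the idea as: $(id -u)==0 ? "" : "sudo".

# Print the prefix a privileged command needs:
# nothing when already root, "sudo" otherwise.
priv_prefix() {
    if [ "$(id -u)" -eq 0 ]; then
        printf ''
    else
        printf 'sudo'
    fi
}

# Run the given command directly when root, through sudo otherwise.
# In a minimal container running as root, sudo is never looked up.
as_root() {
    if [ "$(id -u)" -eq 0 ]; then
        "$@"
    else
        sudo "$@"
    fi
}
```

A script would then write `as_root some-command args` wherever it previously wrote `sudo some-command args`, which keeps the change mechanical across many call sites.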

Clean E2E transcript (tail)

  forkd snapshot --tag python --kernel <vmlinux> --rootfs /var/cache/forkd/python-3-12-slim.ext4
  forkd fork --tag python -n 100
✓ wrote /var/cache/forkd/python-3-12-slim.ext4
  next: forkd snapshot --tag <name> --kernel <vmlinux> --rootfs /var/cache/forkd/python-3-12-slim.ext4 --tap forkd-tap0
==> snapshot --tag quickstart
    rootfs mode: read-write (ext4)
    network: virtio-net via tap forkd-tap0 (guest 10.42.0.2 ↔ host 10.42.0.1)
==> booting parent VM (work_dir=/tmp/forkd-parent-quickstart)...
    firecracker pid: 4776
==> warming up for 10s...
==> pausing...
==> snapshotting to /root/.local/share/forkd/snapshots/quickstart...
    snapshot took 5346 ms
    cleaned work_dir /tmp/forkd-parent-quickstart
✓ tag 'quickstart' ready. Try: forkd fork --tag quickstart --n 10
✓ snapshot quickstart ready.
  next: sudo -E forkd fork --tag quickstart -n N --per-child-netns
==> forking 4 children from 'quickstart'
==> forking 4 children from snapshot 'quickstart' (per-child netns)...
✓ all sockets up in 52 ms
✓ 4 restores fired in parallel in 7 ms
✓ total wall-clock: 59 ms
==> letting children settle for 2s...
✓ 4 / 4 children alive
==> shutting down...
    cleaned work_dir /tmp/forkd-fork-quickstart
✓ quickstart complete — 4 microVMs forked from one warm snapshot.
  where to go next:
    sudo -E forkd fork --tag quickstart -n 100 --per-child-netns    # go wider
    sudo -E forkd exec --child forkd-child-1 -- python3 -c 'print("hi from a fork")'
    forkd images                                # snapshots on disk
    https://github.com/deeplethe/forkd/tree/main/recipes        # CI fan-out, DB fixtures, agent sandboxes
exit=0

59 ms wall-clock for 4 children off a freshly-baked snapshot, 4/4 alive.

Scope note

The hub-pull fallback surfaced a separate, deeper bug — snapshots aren't portable across hosts (rootfs absolute path baked in the vmstate + rootfs not shipped in packs). Filed as #242. quickstart's preferred local-bake path is unaffected and already warns before the hub fallback.

Test plan

  • Full clean E2E (Docker-bake route) in isolated container
  • cargo clippy --all-targets -D warnings + cargo fmt --check clean
  • cargo test -p forkd-cli hub:: 12/12 (incl. 2 new path-rewrite tests)
  • bash -n on all 4 modified scripts
  • CI green

🤖 Generated with Claude Code

WaylandYang and others added 2 commits June 13, 2026 10:09
Ran `forkd quickstart` end-to-end in an isolated privileged container
(host networking untouched). It crashed at six distinct points before a
clean run — every one a wall a new user hits in their first minute:

1. **install-guest-kernel.sh hardcodes `sudo`** — absent in minimal
   environments (containers, slim images) where quickstart already runs
   as root. Added a `$(id -u)==0 ? "" : "sudo"` shim.
2. **`file`/`strings` assumed present** in the same script's post-install
   descriptor lines — guarded with `command -v`.
3. **`$USER` unbound under `set -u`** in host-tap.sh + netns-setup.sh
   (unset in `docker exec` / non-login shells). Fall back through
   `id -un`.
4. **unpack baked the packing host's absolute paths** into snapshot.json,
   so the first restore on any other machine failed with FC's "Failed to
   open snapshot file". `hub::unpack` now rewrites `vmstate`/`memory` to
   the extraction dir; main.rs re-runs the rewrite AFTER the staging→final
   `rename(2)` (v1 and v2 chain paths) since the in-unpack pass points at
   the soon-stale staging dir. Operates on `serde_json::Value` so unknown
   fields and volume paths survive. Two unit tests.
5. **build-rootfs.sh hardcodes `sudo`** at 18 sites — same shim.
6. **tarball installs can't find `scripts/` or `rootfs-init/`** — the
   `from-image` bake shells out to build-rootfs.sh, which finds the guest
   init+agent via `$(dirname $0)/../rootfs-init`. quickstart now stages
   the embedded scripts as `<base>/scripts/*` with the init files in the
   sibling `<base>/rootfs-init/` and points FORKD_SCRIPTS_DIR there.
   Without this the guest booted with no init → `Kernel panic … init
   /forkd-init.sh failed (error -2)` → FC exits → pause fails.
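
Items 2, 3 and 6 each come down to a small defensive idiom. The sketch below shows one plausible shape for each in POSIX shell; it is not the code from the scripts. The function names `describe_kernel`, `current_user` and `check_layout` are hypothetical, and the layout check only assumes the `<base>/scripts` and `<base>/rootfs-init` siblings described in item 6.

```shell
set -u  # the strict mode under which an unset $USER aborts the script

# Item 2: call an optional tool only if it is installed.
# Falls back to printing the bare path when `file` is missing.
describe_kernel() {
    if command -v file >/dev/null 2>&1; then
        file "$1"
    else
        echo "$1 (install 'file' for details)"
    fi
}

# Item 3: $USER can be unset in `docker exec` or non-login shells.
# ${USER:-...} is safe under `set -u`; id -un asks the system instead.
current_user() {
    printf '%s\n' "${USER:-$(id -un)}"
}

# Item 6: verify that the staged base directory has both siblings,
# so a lookup of ../rootfs-init relative to scripts/ can succeed.
check_layout() {
    for dir in "$1/scripts" "$1/rootfs-init"; do
        if [ ! -d "$dir" ]; then
            echo "missing $dir" >&2
            return 1
        fi
    done
}
```

Failing early in a check like `check_layout` surfaces a missing `rootfs-init/` at bake time rather than as a kernel panic inside the guest.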

Also: doctor::preflight() softens the three non-gating rows (kernel/tap/
docker) from ✗ to ⚠ — a red ✗ immediately followed by "quickstart sets
this up" read as a contradiction.

Clean run now: preflight 7✓ → consent heal → docker bake (5.3 s snapshot)
→ fork 4 children in 59 ms wall-clock, 4/4 alive. Transcript saved.

Note: the hub-pull fallback path surfaced a separate, deeper portability
bug (snapshots aren't relocatable across hosts — rootfs absolute path in
the vmstate + rootfs not shipped in packs); filed as #242. quickstart's
preferred local-bake path is unaffected and prints a warning before the
hub fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@WaylandYang WaylandYang merged commit b1ab5fa into main Jun 13, 2026
2 checks passed
@WaylandYang WaylandYang deleted the fix/quickstart-e2e branch June 13, 2026 02:14